Problem/Goal:

This project showcases the capabilities of ggplot2, a popular data visualization package for R, by using the penguins and diamonds datasets to create a variety of plots that demonstrate the package's main functions.

Data Source:

The penguins dataset (from the palmerpenguins package) contains information on the size and species of penguins, while the diamonds dataset (bundled with ggplot2) contains information on the price, carat weight, and other features of diamonds. Both datasets are used to demonstrate the plotting functions available in ggplot2.

Conclusion:

The project demonstrates the flexibility and power of ggplot2 as a data visualization tool in R. The use of both the penguins and diamonds datasets allowed for the creation of a variety of plots that showcased the different functions available in the library, such as histograms, bar plots, scatter plots, and more. The results of the project highlight the effectiveness of ggplot2 for data visualization and the ease with which plots can be created and customized.

1. Install & load the ggplot2 and palmerpenguins packages

# Install once if needed (uncomment to run):
# install.packages(c("ggplot2", "palmerpenguins"))
library("ggplot2")
library("palmerpenguins")

2. View the penguins dataset

data("penguins")
View(penguins)
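
View() opens a spreadsheet-style viewer, which is mainly useful in an interactive session such as RStudio. As a minimal sketch, assuming the palmerpenguins package is installed, the same dataset can also be inspected in the console with base R functions:

```r
library("palmerpenguins")
data("penguins")

dim(penguins)      # number of rows and columns
head(penguins)     # first six rows
str(penguins)      # column names and types
summary(penguins)  # per-column summaries, including NA counts
```

The summary() output is a quick way to spot which columns contain missing values before plotting.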

ggplot2 can produce polished plots with only a few lines of code. It can also add captions to a plot, which is covered in the labels and annotations section below.

3. Creating a scatter plot with the penguins dataset

ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))

Alternative syntax for the same plot (the mapping is supplied to ggplot() and inherited by geom_point())

ggplot(data = penguins, mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point()
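
Both versions of the scatter plot will usually print a warning about removed rows, because a few penguins have no recorded flipper length or body mass. Here is a minimal sketch of two ways to handle this. The name penguins_measured is only an illustrative choice and is not used elsewhere in this project.

```r
library("ggplot2")
library("palmerpenguins")

# Option 1: keep only rows where both plotted variables are present
penguins_measured <- penguins[
  !is.na(penguins$flipper_length_mm) & !is.na(penguins$body_mass_g),
]

ggplot(data = penguins_measured,
       mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point()

# Option 2: keep the data as-is and drop missing points silently
ggplot(data = penguins,
       mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(na.rm = TRUE)
```

Option 1 makes the filtering explicit in the data, while option 2 keeps the original data frame untouched.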

Coding different aesthetics for the penguins scatter plot

3a. Aesthetic: categorizing the species by shape

ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, shape = species))

3b. Aesthetic: categorizing the species by shape & color

ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, shape = species, color = species))

3c. Aesthetic: categorizing the species by shape, color, and size

ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, shape = species, color = species, size = species))
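
Mapping size to a categorical variable such as species typically makes ggplot2 warn that size is not advised for discrete variables, since the default sizes imply an ordering. One possible way to stay in control, sketched here with arbitrarily chosen point sizes, is to set the sizes explicitly with scale_size_manual():

```r
library("ggplot2")
library("palmerpenguins")

# The size values below are arbitrary choices for illustration
p_sizes <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g,
                           shape = species, color = species, size = species),
             na.rm = TRUE) +
  scale_size_manual(values = c(Adelie = 1.5, Chinstrap = 2.5, Gentoo = 3.5))

p_sizes
```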

3d. Aesthetic: categorizing the species by opacity

ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, alpha = species))

3e. Aesthetic: assigning a single color to all the points (set outside of aes())

ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g), color = "purple")
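
The position of color = "purple" matters here. The sketch below contrasts setting the color outside aes() with mapping it inside aes(). The names p_set and p_mapped are illustrative only.

```r
library("ggplot2")
library("palmerpenguins")

# Setting: every point is drawn in purple, and no legend is added
p_set <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g),
             color = "purple", na.rm = TRUE)

# Mapping: "purple" is treated as a one-level category, so ggplot2
# picks a color from its default palette and adds a legend
p_mapped <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g,
                           color = "purple"),
             na.rm = TRUE)

p_set
p_mapped
```

In short, values inside aes() are treated as data, and values outside aes() are treated as fixed visual properties.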

4. Creating a smooth line plot with the penguins dataset

ggplot(data = penguins) +
  geom_smooth(mapping = aes(x = flipper_length_mm, y = body_mass_g))
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
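
The message above reports the smoothing method and formula that geom_smooth() chose by default. As a minimal sketch, both can be stated explicitly, which should also suppress the message. The straight-line fit with method = "lm" is an optional alternative shown for comparison, not something the original plots use.

```r
library("ggplot2")
library("palmerpenguins")

# Same loess smoother as the default, with method and formula spelled out
p_loess <- ggplot(data = penguins) +
  geom_smooth(mapping = aes(x = flipper_length_mm, y = body_mass_g),
              method = "loess", formula = y ~ x, na.rm = TRUE)

# A linear fit instead, without the confidence band
p_lm <- ggplot(data = penguins) +
  geom_smooth(mapping = aes(x = flipper_length_mm, y = body_mass_g),
              method = "lm", formula = y ~ x, se = FALSE, na.rm = TRUE)

p_loess
p_lm
```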

4a. Creating a smooth line plot & scatter plot with the penguins dataset

ggplot(data = penguins) +
  geom_smooth(mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
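
When several layers share the same x and y variables, the mapping can be written once in ggplot() so that each geom inherits it. A short sketch of the same combined plot, with p_combined as an illustrative name:

```r
library("ggplot2")
library("palmerpenguins")

# The mapping in ggplot() applies to both geom_smooth() and geom_point()
p_combined <- ggplot(data = penguins,
                     mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_smooth() +
  geom_point()

p_combined
```

A mapping given inside an individual geom still works and only affects that layer.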

4b. Aesthetic: categorizing the species in the smooth line plot by line type

ggplot(data = penguins) +
  geom_smooth(mapping = aes(x = flipper_length_mm, y = body_mass_g, linetype = species))
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

5. Using geom_jitter() to prevent points in a scatter plot from overlapping

ggplot(data = penguins) +
  geom_jitter(mapping = aes(x = flipper_length_mm, y = body_mass_g))
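
geom_jitter() adds a small amount of random noise to each point, so the plot looks slightly different every time it is drawn. A sketch of how the amount of noise and the random seed can be controlled through position_jitter(). The width, height, and seed values are arbitrary choices for illustration.

```r
library("ggplot2")
library("palmerpenguins")

p_jitter <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g),
             # jitter horizontally only, with a fixed seed for repeatable output
             position = position_jitter(width = 0.4, height = 0, seed = 42),
             na.rm = TRUE)

p_jitter
```

Setting height = 0 leaves the body mass values exactly where they are, which keeps the y axis readable as real measurements.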

6. Using facet_grid() to split the data into separate scatter plots for each combination of sex and species

ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  facet_grid(sex ~ species)
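
Some penguins have no recorded sex, so the grid above includes an extra row of panels labeled NA. If that row is not wanted, one option is to filter those penguins out first. This is a sketch, and the name penguins_sexed is only illustrative.

```r
library("ggplot2")
library("palmerpenguins")

# Keep only penguins with a recorded sex
penguins_sexed <- subset(penguins, !is.na(sex))

ggplot(data = penguins_sexed) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g,
                           color = species)) +
  facet_grid(sex ~ species)
```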

7. Using facet_wrap() to split the data into separate scatter plots by species only

ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  facet_wrap(~species)

Loading & viewing the diamonds dataset

data("diamonds")
View(diamonds)

1. Creating a bar graph to visualize the diamonds dataset

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))
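
Note that only x is mapped here: geom_bar() counts the rows for each cut and uses the count as the bar height. The sketch below makes that explicit by computing the counts first and drawing them with geom_col(), which should give the same bars. The names cut_counts, p_bar, and p_col are illustrative.

```r
library("ggplot2")

data("diamonds")

# geom_bar() counts the rows per cut internally
p_bar <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))

# The same counts computed by hand, then drawn as-is with geom_col()
cut_counts <- as.data.frame(table(cut = diamonds$cut))

p_col <- ggplot(data = cut_counts) +
  geom_col(mapping = aes(x = cut, y = Freq))

p_bar
p_col
```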

1a. Aesthetic: Classifying cut types by bar graph border color

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, color = cut))

1b. Aesthetic: Classifying cut types by bar graph fill color

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = cut))

1c. Aesthetic: Classifying clarity within each cut type by bar graph fill color (stacked bars)

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity))
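
By default the clarity segments are stacked on top of each other. As a sketch of two alternatives, the position argument of geom_bar() can place the segments side by side or rescale each bar to proportions. The names p_dodge and p_fill are illustrative.

```r
library("ggplot2")

data("diamonds")

# Side-by-side bars: easier to compare clarity counts within a cut
p_dodge <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge")

# Proportional bars: every bar has the same height, showing clarity shares
p_fill <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "fill")

p_dodge
p_fill
```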

1d. Using facet_wrap() to split the cut types into individual bar graphs

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = color, fill = cut)) +
  facet_wrap(~cut)

Adding labels and annotations to the penguins dataset plots

1. Adding a title

ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  labs(title = "Palmer Penguins: Body Mass vs. Flipper Length")

2. Adding a title & subtitle

ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  labs(title = "Palmer Penguins: Body Mass vs. Flipper Length", subtitle = "Sample of Three Penguins Species")

3. Adding a title, subtitle, and caption (good for citing sources)

ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  labs(title = "Palmer Penguins: Body Mass vs. Flipper Length", subtitle = "Sample of Three Penguins Species", caption = "Data collected by Dr. Kristen Gorman")
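
labs() is not limited to the title, subtitle, and caption. As a sketch, the same call can also rename the axes and the legend. The label wording below is my own suggestion and is not part of the original plots.

```r
library("ggplot2")
library("palmerpenguins")

p_labeled <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g,
                           color = species),
             na.rm = TRUE) +
  labs(title = "Palmer Penguins: Body Mass vs. Flipper Length",
       x = "Flipper length (mm)",  # replaces the column name on the x axis
       y = "Body mass (g)",        # replaces the column name on the y axis
       color = "Species")          # legend title for the color aesthetic

p_labeled
```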

4. Adding a title, subtitle, caption, and an annotation into the plot

ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  labs(title = "Palmer Penguins: Body Mass vs. Flipper Length", subtitle = "Sample of Three Penguins Species", caption = "Data collected by Dr. Kristen Gorman") + 
  annotate("text", x = 220, y = 3500, label = "The Gentoos are the largest")

5. Adding a title, subtitle, caption, and an annotation in a purple color into the plot

ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  labs(title = "Palmer Penguins: Body Mass vs. Flipper Length", subtitle = "Sample of Three Penguins Species", caption = "Data collected by Dr. Kristen Gorman") + 
  annotate("text", x = 220, y = 3500, label = "The Gentoos are the largest", color = "purple")

6. Adding a title, subtitle, caption, and an annotation in a purple color with a bold font into the plot

ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  labs(title = "Palmer Penguins: Body Mass vs. Flipper Length", subtitle = "Sample of Three Penguins Species", caption = "Data collected by Dr. Kristen Gorman") + 
  annotate("text", x = 220, y = 3500, label = "The Gentoos are the largest", color = "purple", fontface = "bold", size = 4.5)

7. Adding a title, subtitle, caption, and a bold purple annotation that is rotated with the angle argument

ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  labs(title = "Palmer Penguins: Body Mass vs. Flipper Length", subtitle = "Sample of Three Penguins Species", caption = "Data collected by Dr. Kristen Gorman") + 
  annotate("text", x = 220, y = 3500, label = "The Gentoos are the largest", color = "purple", fontface = "bold", size = 4.5, angle = 25)

8. A shortcut for #7 is to store the plot and its labels in a variable, p

p <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  labs(title = "Palmer Penguins: Body Mass vs. Flipper Length", subtitle = "Sample of Three Penguins Species", caption = "Data collected by Dr. Kristen Gorman") 

9. We can then add the annotation by combining p with annotate()

p + annotate("text", x = 220, y = 3500, label = "The Gentoos are the largest", color = "purple", fontface = "bold", size = 4.5, angle = 25)
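
Because p + annotate(...) returns a new plot and leaves p unchanged, the stored plot can be reused for several variants. A minimal, self-contained sketch, where p_note and p_facets are illustrative names:

```r
library("ggplot2")
library("palmerpenguins")

# Base plot stored once
p <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g,
                           color = species),
             na.rm = TRUE) +
  labs(title = "Palmer Penguins: Body Mass vs. Flipper Length")

# Variant 1: the base plot with the rotated annotation
p_note <- p +
  annotate("text", x = 220, y = 3500, label = "The Gentoos are the largest",
           color = "purple", fontface = "bold", size = 4.5, angle = 25)

# Variant 2: the same base plot split into one panel per species
p_facets <- p + facet_wrap(~species)

p_note
p_facets
```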

Thank you for reading, Ibrahim